Aspectos matemáticos da entropia
ABSTRACT: This work surveys the development of the concept of
entropy.
We start with a historical perspective on the evolution of
this concept.
We then state the most important properties of the Shannon, Rényi
and Tsallis entropies, as well as of the Kullback-Leibler
information gain, and give proofs of some of these properties.
We present and discuss the axiomatic definitions of the Shannon
and Tsallis entropies. We also mention the axiomatic definition of
Rényi's information gain, which leads to the definition of the
Rényi entropy.
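The four quantities named in this abstract can be illustrated with a minimal sketch. It assumes the standard textbook definitions for a finite probability distribution and uses natural logarithms (nats); the function names are hypothetical and are not taken from the dissertation.

```python
import math
from typing import Sequence


def shannon_entropy(p: Sequence[float]) -> float:
    """H(p) = -sum_i p_i * log(p_i), with 0 * log(0) taken as 0."""
    return -sum(x * math.log(x) for x in p if x > 0)


def renyi_entropy(p: Sequence[float], alpha: float) -> float:
    """H_alpha(p) = log(sum_i p_i**alpha) / (1 - alpha), alpha > 0, alpha != 1."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log(sum(x ** alpha for x in p if x > 0)) / (1 - alpha)


def tsallis_entropy(p: Sequence[float], q: float) -> float:
    """S_q(p) = (1 - sum_i p_i**q) / (q - 1), q != 1."""
    if q == 1:
        raise ValueError("q must be different from 1")
    return (1 - sum(x ** q for x in p if x > 0)) / (q - 1)


def kl_information_gain(p: Sequence[float], r: Sequence[float]) -> float:
    """D(p || r) = sum_i p_i * log(p_i / r_i); infinite if r_i = 0 < p_i."""
    total = 0.0
    for a, b in zip(p, r):
        if a > 0:
            if b == 0:
                return math.inf
            total += a * math.log(a / b)
    return total


# For the uniform distribution on four outcomes, the Shannon and Rényi
# entropies both equal log(4), and the Tsallis entropy with q = 2 is 0.75.
uniform = [0.25, 0.25, 0.25, 0.25]
print(shannon_entropy(uniform), renyi_entropy(uniform, 2.0))
print(tsallis_entropy(uniform, 2.0), kl_information_gain(uniform, uniform))
```

As alpha and q tend to 1, the Rényi and Tsallis entropies both tend to the Shannon entropy, which is why the value 1 is excluded from the two parametrised functions.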
Análise de distribuições de distâncias entre palavras genómicas
The investigation of DNA has been one of the most developed areas of
research in this and the last century. However, there is still a long way to go
before the DNA code is fully understood. With the increasing amount of sequenced
DNA data, mathematical methods play an important role in addressing the need
for efficient quantitative techniques for the detection of regions of interest
and overall characteristics in these sequences.
A feature of interest in the study of genomic words is their spatial distribution
along a DNA sequence, which can be characterized by the distances between
words. Counting such distances provides discrete distributions that may
be analyzed from a statistical point of view. In this work we explore the
distances between genomic words as a mathematical descriptor of DNA
sequences. The main goal is to design, develop and apply statistical methods
tailored to these distributions, in order to capture information
about the primary and secondary structure of DNA.
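The idea of an inter-word distance distribution can be sketched as follows. This is a minimal illustration, not the procedure used in the thesis: it assumes that the distance is the difference between the start positions of consecutive occurrences of a word and that overlapping occurrences are counted. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def inter_word_distances(sequence: str, word: str) -> List[int]:
    """Distances between start positions of consecutive occurrences of word."""
    positions = []
    start = sequence.find(word)
    while start != -1:
        positions.append(start)
        # Resume one position later so that overlapping occurrences count.
        start = sequence.find(word, start + 1)
    return [b - a for a, b in zip(positions, positions[1:])]


def distance_distribution(sequence: str, word: str) -> Dict[int, float]:
    """Empirical relative frequency of each observed inter-word distance."""
    distances = inter_word_distances(sequence, word)
    if not distances:
        return {}
    counts = Counter(distances)
    total = len(distances)
    return {d: counts[d] / total for d in sorted(counts)}


# "ACG" starts at positions 0, 3 and 7, so the distances are [3, 4].
print(inter_word_distances("ACGACGTACG", "ACG"))
print(distance_distribution("ACGACGTACG", "ACG"))
```

Each word of a given length yields one such discrete distribution, so a sequence gives 4^k distributions for words of length k over the alphabet {A, C, G, T}.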
The characterization of empirical inter-word distance distributions involves
the problem of the exponential increase in the number of distributions
as the word length increases, leading to the need for data reduction.
Moreover, if the data can be validly clustered, the class labels may provide
a meaningful description of similarities and differences between sets of
distributions. Therefore, we explore the potential of inter-word distance
distributions to produce a word clustering able to highlight similar patterns
of word distributions as well as summarized characteristics of each set of
distributions.
With the aim of performing comparative studies between genomic sequences
and defining species signatures, we deduce exact distributions of inter-word
distances under random scenarios. Based on these theoretical distributions,
we define genomic signatures able to discriminate between species
and to capture their evolutionary relations. We presume that the study of
distribution similarities and the clustering procedure allow the identification
of words whose distance distribution differs strongly from a reference
distribution or from the global behaviour of the majority of the words. One of
the key topics of our research is the establishment of procedures that capture
distance distributions with atypical behaviours, herein referred to as atypical
distributions.
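Comparing an empirical distance distribution with a reference distribution can be sketched in the simplest possible random scenario. The sketch assumes independent, identically distributed symbols and a word of length one, in which case the distance between consecutive occurrences is geometric. For longer words the exact distribution depends on how the word overlaps with itself, and this sketch does not cover that case. The total variation distance used as a dissimilarity score, the truncation point and the empirical values are illustrative assumptions, not the measures or data of the thesis.

```python
from typing import Dict


def geometric_reference(p: float, max_distance: int) -> Dict[int, float]:
    """P(D = d) = (1 - p)**(d - 1) * p for d = 1, ..., max_distance.

    Reference for a single symbol with occurrence probability p under
    independent symbols, truncated at max_distance.
    """
    return {d: (1 - p) ** (d - 1) * p for d in range(1, max_distance + 1)}


def total_variation(f: Dict[int, float], g: Dict[int, float]) -> float:
    """Half the L1 distance between two discrete distributions."""
    support = set(f) | set(g)
    return 0.5 * sum(abs(f.get(d, 0.0) - g.get(d, 0.0)) for d in support)


# Hypothetical empirical distribution, strongly concentrated at distance 3.
empirical = {1: 0.10, 2: 0.10, 3: 0.80}
reference = geometric_reference(p=0.25, max_distance=200)

# A score close to 1 indicates strong departure from the reference.
score = total_variation(empirical, reference)
print(round(score, 3))
```

Words whose score exceeds a chosen threshold would be candidates for the atypical distributions mentioned above; how that threshold is set is left open here.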
In the genomic context, words with an atypical distance distribution may
be related to some biological function (motifs). We expect that our
results may be used to provide some sort of classification of sequences,
identifying evolutionary patterns and allowing for the prediction of functional
properties, thereby contributing to the advancement of knowledge about
DNA sequences.
Programa Doutoral em Matemátic
Characterization of a large cluster of HIV-1 A1 infections detected in Portugal and connected to several Western European countries
HIV-1 subtypes are associated with differences in transmission and disease progression. Thus, the existence of geographic hotspots of subtype diversity deepens the complexity of HIV-1/AIDS control. The already high subtype diversity in Portugal seems to be increasing due to infections with sub-subtype A1 virus. We performed phylogenetic analysis of 65 A1 sequences newly obtained from 14 Portuguese hospitals and 425 closely related database sequences. Of the Portuguese A1 isolates, 80% clustered in a main phylogenetic clade (MA1). Six transmission clusters were identified in MA1, encompassing isolates from Portugal, Spain, France, and the United Kingdom. The most common transmission route identified was men who have sex with men. The origin of MA1 was linked to Greece, with the first introduction to Portugal dating back to 1996 (95% HPD: 1993.6-1999.2). Individuals infected with MA1 virus showed lower viral loads and higher CD4+ T-cell counts than those infected with subtype B. The expanding A1 clusters in Portugal are connected to other European countries and share a recent common ancestor with the Greek A1 outbreak. The recent expansion of this HIV-1 subtype might be related to slower disease progression, leading to a population-level delay in its diagnosis.
Supported by FEDER, COMPETE, and FCT through projects NORTE-01-0145-FEDER-000013, POCI-01-0145-FEDER-007038 and IF/00474/2014; FCT PhD scholarship PDE/BDE/113599/2015; FCT contract FCT IF/00474/2014; European Funds through grant BEST HOPE (project funded through HIVERA, grant 249697); and FCT PTDC/DTP-EPI/7066/2014. The Global Health and Tropical Medicine Center is funded through FCT (UID/Multi/04413/2013). We would like to acknowledge all the patients and health care professionals from the Portuguese hospitals who contributed in some way to this study.
Characterisation of microbial attack on archaeological bone
As part of an EU-funded project to investigate the factors influencing bone preservation in the archaeological record, more than 250 bones from 41 archaeological sites in five countries spanning four climatic regions were studied for diagenetic alteration. Sites were selected to cover a range of environmental conditions and archaeological contexts. Microscopic and physical (mercury intrusion porosimetry) analyses of these bones revealed that the majority (68%) had suffered microbial attack. Furthermore, significant differences were found between animal and human bone in both the state of preservation and the type of microbial attack present. These differences in preservation might result from differences in the early taphonomy of the bones. © 2003 Elsevier Science Ltd. All rights reserved.
NEOTROPICAL ALIEN MAMMALS: a data set of occurrence and abundance of alien mammals in the Neotropics
Biological invasion is one of the main threats to native biodiversity. For a species to become invasive, it must be voluntarily or involuntarily introduced by humans into a nonnative habitat. Mammals were among the first taxa to be introduced worldwide for game, meat, and labor, yet the number of species introduced in the Neotropics remains unknown. In this data set, we make available occurrence and abundance data on mammal species that (1) crossed a geographical barrier and (2) were voluntarily or involuntarily introduced by humans into the Neotropics. Our data set is composed of 73,738 historical and current georeferenced records on alien mammal species, of which around 96% correspond to occurrence data on 77 species belonging to eight orders and 26 families. Data cover 26 continental countries in the Neotropics, ranging from Mexico and its frontier regions (southern Florida and coastal-central Florida in the southeast United States) to Argentina, Paraguay, Chile, and Uruguay, and the 13 Caribbean island countries. Our data set also includes neotropical species (e.g., Callithrix sp., Myocastor coypus, Nasua nasua) considered alien in particular areas of the Neotropics. The species with the most records are Bos sp. (n = 37,782), Sus scrofa (n = 6,730), and Canis familiaris (n = 10,084); 17 species were represented by only one record (e.g., Syncerus caffer, Cervus timorensis, Cervus unicolor, Canis latrans). Primates have the highest number of species in the data set (n = 20 species), partly because of uncertainties regarding taxonomic identification of the genus Callithrix, which includes the species Callithrix aurita, Callithrix flaviceps, Callithrix geoffroyi, Callithrix jacchus, Callithrix kuhlii, Callithrix penicillata, and their hybrids. This unique data set will be a valuable source of information for invasion risk assessments, biodiversity redistribution and conservation-related research. There are no copyright restrictions.
Please cite this data paper when using the data in publications. We also request that researchers and teachers inform us of how they are using the data.
Neotropical freshwater fisheries: A dataset of occurrence and abundance of freshwater fishes in the Neotropics
The Neotropical region hosts 4225 freshwater fish species, ranking first among the world's most diverse regions for freshwater fishes. Our NEOTROPICAL FRESHWATER FISHES data set is the first to produce a large-scale Neotropical freshwater fish inventory, covering the entire Neotropical region from Mexico and the Caribbean in the north to the southern limits in Argentina, Paraguay, Chile, and Uruguay. We compiled 185,787 distribution records, with unique georeferenced coordinates, for the 4225 species, represented by occurrence and abundance data. The numbers of species in the most species-rich orders are as follows: Characiformes (1289), Siluriformes (1384), Cichliformes (354), Cyprinodontiformes (245), and Gymnotiformes (135). The most recorded species was the characid Astyanax fasciatus (4696 records). We registered 116,802 distribution records for native species, compared to 1802 distribution records for nonnative species. The main aim of the NEOTROPICAL FRESHWATER FISHES data set was to make these occurrence and abundance data accessible to international researchers for ecological and macroecological studies, from local to regional scales, with focal fish species, families, or orders. We anticipate that the NEOTROPICAL FRESHWATER FISHES data set will be valuable for studies on a wide range of ecological processes, such as trophic cascades, fishery pressure, the effects of habitat loss and fragmentation, and the impacts of species invasion and climate change. There are no copyright restrictions on the data. Please cite this data paper when using the data in publications.
NEOTROPICAL XENARTHRANS: a data set of occurrence of xenarthran species in the Neotropics
Xenarthrans (anteaters, sloths, and armadillos) have essential functions for ecosystem maintenance, such as insect control and nutrient cycling, playing key roles as ecosystem engineers. Because of habitat loss and fragmentation, hunting pressure, and conflicts with domestic dogs, these species have been threatened locally, regionally, or even across their full distribution ranges. The Neotropics harbor 21 species of armadillos, 10 anteaters, and 6 sloths. Our data set includes the families Chlamyphoridae (13), Dasypodidae (7), Myrmecophagidae (3), Bradypodidae (4), and Megalonychidae (2). We have no occurrence data on Dasypus pilosus (Dasypodidae). Regarding Cyclopedidae, until recently, only one species was recognized, but new genetic studies have revealed that the group is represented by seven species. In this data paper, we compiled a total of 42,528 records of 31 species, represented by occurrence and quantitative data, totaling 24,847 unique georeferenced records. The geographic range is from the southern United States, Mexico, and Caribbean countries at the northern portion of the Neotropics, to the austral distribution in Argentina, Paraguay, Chile, and Uruguay. Regarding anteaters, Myrmecophaga tridactyla has the most records (n = 5,941), and Cyclopes sp. has the fewest (n = 240). The armadillo species with the most data is Dasypus novemcinctus (n = 11,588), and the fewest data are recorded for Calyptophractus retusus (n = 33). With regard to sloth species, Bradypus variegatus has the most records (n = 962), and Bradypus pygmaeus has the fewest (n = 12). Our main objective with Neotropical Xenarthrans is to make occurrence and quantitative data available to facilitate more ecological research, particularly if we integrate the xenarthran data with other data sets of the Neotropical Series that will become available very soon (i.e., Neotropical Carnivores, Neotropical Invasive Mammals, and Neotropical Hunters and Dogs).
Therefore, studies on trophic cascades, hunting pressure, habitat loss, fragmentation effects, species invasion, and climate change effects will be possible with the Neotropical Xenarthrans data set. Please cite this data paper when using its data in publications. We also request that researchers and teachers inform us of how they are using these data.